It is unclear how to think about trust or to model its ebb
and flow. Is there some sort of Second Law of Thermodynamics
of trust, where trust starts high and is dissipated
over time? Or is it the contrary, that trust starts
low and can grow through a series of good experiences? 
Is it more complex, and how can the waxing and waning 
be thought about? 
Abstract

The diverse views of science of security have opened up several alleys 
towards applying the methods of science to security. We pursue a different 
kind of connection between science and security. This paper explores the 
idea that security is not just a suitable subject for science, but that the 
process of security is also similar to the process of science. This similarity 
arises from the fact that both science and security depend on the methods 
of inductive inference. Because of this dependency, a scientific theory can 
never be definitely proved, but can only be disproved by new evidence, 
and improved into a better theory. Because of the same dependency, every 
security claim and method has a lifetime, and always eventually needs to 
be improved. 

In this general framework of security-as-science, we explore the ways 
to apply the methods of scientific induction in the process of trust. The 
process of trust building and updating is viewed as hypothesis testing. We 
propose to formulate the trust hypotheses by the methods of algorithmic 
learning, and to build more robust trust testing and vetting methodologies 
on the solid foundations of statistical inference. 


1 Introduction 

The effort towards science of security was born from the need for a more
systematic approach to security. It resulted in new empiric and
experimental approaches to cyber security [29] [30]. The fact that science of
security still means many things to many people should perhaps be seen as a
feature and not a bug, since already security on its own means many things to
many people, and it is natural that they study it from many directions [34]. On
the other hand, it seems that each step of scientific progress requires a unifying
idea, each of them showing that a certain group of trees is actually a forest [20].
What is then the unifying idea of science of security?

1.1 Science is something else 

1.1.1 What science is not 

Every known civilization seems to have developed technology, art, and religion. 
But only the Western Civilization has developed science. Science emerged in 
Europe during the Renaissance, and caused the Industrial Revolution. This 
unique stream of events is analyzed in some detail in [20].¹

There are, of course, many definitions of science. Some of them are shaped
to include the teachings of L. Ron Hubbard; some to include Marxism, or even
the daily thoughts of the current leader of North Korea. Most definitions,
however, point to some of the features of the methodological movement that 
led to understanding the natural processes like heat, electricity, magnetism, 
radiation, or networking. Although the notion of science can be extended to 
include astrology, Scientology, theology, mathematics, or engineering, it does 
not seem useful to stretch it too much. Assigning the status of science, say, 
to the engineering principles and processes (whether those that enabled the 
public works of Ancient Egypt, or those that emerged in medieval alchemy, or 
in Renaissance architecture, or in modern software engineering) might conceal 
something essential about science. Can science be reduced to its technological 
thrust [22]? Or does it boil down to the view that the world is governed by a 
system of laws [38]? Or is there more to it? 

Many ancient civilizations developed the quantitative methods that enabled
them to plan and execute extensive engineering projects, and change the
landscapes of their environment. Many of them also explained the world around
them through sophisticated theoretical edifices that included the 'Laws of
Nature', formalized as mythologies, or gathered in sacred texts, often equipped
with extensive symbolic systems. But no one until the Age of Science came
anywhere near to understanding and reproducing, e.g., the thermo-nuclear processes
of the Sun; or the space-time curvature, without which our GPS systems could not
surf on the geodesics, and would keep sending us wrong coordinates. No one
before science came anywhere near to understanding genomics and to engineering
the basic processes of life; and nowhere near to connecting our world into a
network of networks, and spanning a distance-free space, where every two nodes
are neighbors, and where our joint problem solving and problem creating
capabilities seem to be reaching a completely new level. This network of networks is
what we call cyber space. Inhabited by the processes that we programmed, but
whose interactions we cannot control, cyber space hosts a new nature in need
of a new science. Is this new level of our civilization just another new level of
yet another civilization, or is it something else? Is the science that brought it
all about just another way that we found to generate new technologies, or just
another religion that tells us the laws of the world, or is it something essentially
new?

¹ It has been objected that this view can be construed as eurocentric. While the word
"science" can, of course, be used to denote many things, as explained in the next paragraph,
theory of science defines science as the movement that led to the Industrial Revolution. Since
the fact that the Industrial Revolution emerged in Europe is historically uncontestable, the
fact that science emerged in Europe follows from this definition. Moreover, as incisive critics
of the Industrial Revolution even before its current destructive consequences became clear, the
theorists of science can hardly be accused of praising Europe for being the cradle of science.

There is a qualitative difference between the science-generated technologies,
and the spontaneously evolved technologies. There is also a qualitative
difference between the symbolic systems of theologies and mythologies that emerge
from religions, and the symbolic systems of mathematics and computation that
underlie science. There is a qualitative difference between the religious rituals
on one hand and the scientific protocols on the other. The essence of these
differences is not in the levels of complexity or effectiveness. There are complex
religious systems, and there are simple scientific theories. Many religions and
even superstitions postulate their 'Laws of Nature' that are structurally
indistinguishable from those postulated by science. Astrology and phrenology have
in their time been tested as scientific theories, by scientific methods, and rejected
not for structural reasons, not as unscientific, but as wrong. And there are also
effective religious systems, and there are ineffective sciences. E.g., although the
processes of photosynthesis are everywhere around us, at the bottom of all of
our food chains, science has remained unable to understand what plants
really do when they bind photons into sugars. There is a quantum effect, but
science has been ineffective in explaining it. It has also been less effective than
most religions in addressing people's emotional and social needs.

So what really distinguishes scientific theories, if it is not complexity, and 
not effectiveness? 

1.1.2 What science is 

I propose to consider the logical pattern of inductive inference as the essence of
science: While religion claims to provide the truth, science only seeks to disprove
false hypotheses.²

² There are, of course, many other ways to characterize science. The claim here is that this
one is useful for the purposes of science of security.

In a formal sense, science is the quest for disproving theories. This formal
sense was fully implemented for the first time in Ronald Fisher's practical
methods of scientific inference [?], and then analyzed theoretically in Karl
Popper's extensive and influential work [36]. The historic support for this view
of science was provided by Thomas Kuhn [20], while the scientists themselves
provided some of the most compelling examples from their current practices [12].
Other leading templates of scientific inference (e.g. the Neyman-Pearson testing
[25], or Bayesian inference [?]) may appear to offer ways beyond this negative
logic of science, as the quest for merely improving scientific theories through
disproving false hypotheses. But a closer look shows that they only formalize
the task of hypothesis selection, and thus support formation of new theories, not
proving. They do not provide a method to definitely prove anything. Richard
Feynman announced this with compelling simplicity in his lectures on 'The
Character of Physical Law' [?]:

If we have a definite theory, from which we can compute the consequences
which can be compared with experiment, then in principle
we can prove that theory wrong. But notice that we can never
prove it right. Suppose that you invent a theory, calculate the consequences,
and discover every time that the consequences agree with
the experiment. The theory is then right? No, it is simply not 
proved wrong! In the future you could compute a wider range of 
consequences, there could be a wider range of experiments, and you 
might then discover that the thing is wrong. [... ] — We never are 
definitely right; we can only be sure when we are wrong. 

This is perhaps the best kept secret of science: Science does not provide persistent
theories; it only provides methods to disprove and improve our hypotheses.

1.2 Security is like science 

The fact that the process of security is of the same type as the process of
science can be illustrated by translating Feynman's statement from the language
of science to the language of security:

If we have a precisely defined security claim about a system, from 
which we can derive the consequences which can be tested, then 
in principle we can prove that the system is insecure. But we can 
never prove that it is secure. Suppose that you design a system, 
calculate some security claims, and discover every time that the 
system remains secure under all tests. The system is then secure?
No, it is simply not proved insecure! In the future you could refine
the security model, there could be a wider range of tests and attacks, 
and you might then discover that the thing is insecure. — We never 
are definitely secure; we can only be sure when we are insecure. 

A scientific approach to security must therefore begin with the realization 
that there is no persistent security. Cryptographers have known for a long time 
that every key has a lifetime. It is time that we recognize that every security 
claim has a lifetime. The designers of protocols and systems have, of course, 
accumulated a lot of empiric evidence about this phenomenon [32]. The point 
is to understand it as a logical phenomenon. 

Upon the admission that theories cannot be definitely proved, but only disproved
and improved, science has gained its current unparalleled power to harness
nature. Upon the realization that security guarantees cannot be definitely
assured, but only broken and strengthened, science of security will gain the
ability to tap its power to protect from the same methodological source.
1.3 Zoom on trust


In this paper we focus on the scientific approaches to a special family of security 
claims: the statements of trust. While a general security claim says that a key 
K is uncompromised, or that a protocol P guarantees an authentic channel, a 
statement of trust says that Alice trusts the key K for use in a particular cipher, 
or that Bob trusts the protocol P to establish an authentic channel with Alice. 
A statement of trust is thus a security statement bound to two subjects and an 
object: who trusts what to whom. The parallel between the security processes 
of trust building and the scientific methods of hypothesis testing seems like a 
particularly good illustration of the general logical link of security and science, 
so we pursue it in the rest of this paper. 

Outline of the paper 

In Sec. 2 we briefly explain the concept of trust used in the paper, and why it is
interesting to model the process of trust as hypothesis testing. In Sec. 3 we show
on toy examples how to apply the three standard methods of statistical inference
in trust testing. In Sec. 4 we show how to formulate the best trust hypotheses
a priori, since it is notoriously difficult to extract the normal behavioral profiles
from empiric data. In Sec. 5 we comment about the relations of the presented
ideas with the other views of trust, and with the application of statistics in
intrusion detection.


2 Trust as hypothesis testing 

2.1 What is trust? 

Security analyses often begin with the assumptions that some of the subjects
are honest, i.e. that they behave according to some prescribed protocol rules,
whereas the others are dishonest, and launch attacks. Trust internalizes the
honesty assumptions into beliefs of subjects about each other. E.g., we say
that Alice trusts Bob if she believes that he will behave honestly, according to
some protocol agreed implicitly or explicitly. In such a trust statement, Alice
is the trustor, and Bob is the trustee. In social and electronic networks, and on
the web, trust is implemented in a variety of ways: as feedback services in web
commerce, as the web of trust or certificate authorities in the various versions of
Public Key Infrastructure, etc. The underlying trust models often include trust
ratings, which quantify trust, and the entrusted concepts, which qualify trust.
A survey of the models of trust used in computer security research can be found
in [?]. Dynamics of the trust processes in network computation were analyzed
in [?], and the problem of trust was introduced in the framework of
science of security in [16].
2.2 Inductive inference of trust

Just like science can never settle but has to keep testing its theories and
refining its hypotheses, trust can also never settle and needs to keep testing its
hypotheses. Just like a scientific theory can always turn out to be wrong, trust
can always be broken. The reasoning about such ongoing processes goes under
the name of inductive logics, which is quite different from, and much less familiar
than, deductive logics. The central problem of the inductive inference of trust is
expressed by the central principle of the modern court of law, i.e. the principle
of due process: that the accused must be presumed innocent until proven guilty
[35]. But this is just the legal form of a more general social principle of trust:
that people should be trusted until proven untrustworthy. The burden of proof is
here on the prosecution, or on the accusers. The dual principle of ordeal, typical
of medieval trials, places the burden of proof on the defense, and requires that
the accused be presumed guilty until proven innocent. The corresponding social
maxim is the principle of distrust (or suspicion), namely that people should
be trusted only if they are proven trustworthy. These two views of trust, the
optimistic and the pessimistic one, correspond to the two social functions of
trust:

• to support stable social links based on cumulative trust: "I trust you
because I know you"

• to enable new social links through a leap of trust: "I trust you although I
don't know you"

Note that both the trust principle and the suspicion principle are asserted in a 
logical process akin to science: they are hypotheses that need to be tested. The 
logical parallel described in the Introduction emerges again: just like a scientific 
theory can always be disproved by a new experiment, but can never be definitely 
proven, trust can always be broken, and can never be settled. We just follow 
this parallel. 

2.3 How to trust methodically? 

The scientific method is the method for hypothesis testing through empiric 
validation. This means that a scientific theory can only be validated on a finite 
number of samples or instances, since the empiric data are always finite. Hence 
the asymmetry of inductive inference: while a counterexample can definitely 
disprove a theory, no amount of experience can definitely prove it. This is 
where the problem of induction emerges [21] . 

Statistical methods have been developed as tools for deciding when to reject
a hypothesis [?], and also which alternative hypothesis to endorse [27] [28].
In the experimental setting, statistical methods moreover allow testing multiple
hypotheses and quantifying their likelihood [3].
2.4 How many trust values?

Up to the point where the trust decisions need to be made, trust can be quantified
in many ways, reflected to some extent by the trust ratings, as mentioned
in Sec. 2.1. There may be many colors, shades, and nuances of trust, in-between
trusting and not trusting. At the end of the day, though, a trust decision must
be extracted: Will the trustor trust the trustee enough to enter into the entrusted
transaction? At the moment of decision, all previous considerations are reduced
to one of the two answers: yes or no. This simple outcome is not only the process
requirement of trust, akin to the process requirement of justice, where the
verdict of guilty or not guilty must be extracted from whatever mixture of subtle 
and dubious concerns may precede it. More importantly, the final trust decision 
is in principle also the only observable manifestation of trust. The rich models 
of trust are our theories, attempting to explain the unobservable causes of the 
trust decisions. With such theories, science always does the same thing: it tests 
them as hypotheses, and decides whether they should be rejected or not yet. 
The good news is that the trust process seems similar. The bad news is that 
the yes-no decisions are not simple. 

In Sec. 3 we sketch how the basic statistical methodologies apply to trust
decisions, i.e. how the trust hypotheses can be tested scientifically. In the
subsequent Sec. 4 we discuss a harder problem of trust science, which does not
yield to the standard methodologies: how to formulate the trust hypotheses for
testing.


3 Testing trust hypotheses 

Suppose that you are interacting with a system $S$ presented by a set of observable
behaviors $B$. Depending on the ongoing observations of the system
behaviors, you must make decisions whether to entrust the system with some
critical or security sensitive operations. For instance, if $S$ is a computational
device, then $B$ can consist of the various computational behaviors: it may run
fast or slowly, it may crash or spontaneously restart, it may show high or low
CPU load, frequent or intermittent network accesses, various power usage
behaviors, etc. If $S$ is a closed network or a large organization, then the observable
behaviors $B$ may consist of the various network phenomena, such as local load
imbalances, clustering and community formations, network chatter or its
absence, and so on. If $S$ is a market segment or a network of contractors, then $B$
consists of the various market behaviors: clear or unclear market positions and
strategies, pricing drift, shifts in supply or demand, overt or covert information
propagation. In all cases, it is interesting to assume that the observable
behaviors conceal some ultimately unobservable causes: the computational device
may have a firmware virus or a hidden hardware component; the organization
may be penetrated by undetectable moles, or bubbling with defectors; the
market may be manipulated by a colluding cluster, or swayed by hidden incentives.
Science offers methods to detect the unobservable causes of some observable
phenomena.

The observations of the observable behaviors $B$ are modeled by a real function
$f : B \to \mathbb{R}$, which is often called a statistic. A statistic may list the raw
measurements of a sample, but it more often displays some property, e.g. the
mean, the deviation, a higher-order moment, or some other combination of data.

One thing that a statistic does not display is a distribution of the behaviors
in $B$. The distribution of the behaviors, i.e. how often a behavior $b \in B$
comes about in a system $S$, is what a scientific analysis attempts to induce from
the observations. More precisely, a scientific analysis proceeds by

(1) setting a hypothesis $\theta$, presented by a probability distribution
$\Pr_\theta : B \to [0,1]$, and then

(2) testing whether the statistic $f : B \to \mathbb{R}$ supports or disproves the
hypothesis $\theta$.

In the context of trust, the probability distribution $\Pr_\theta : B \to [0,1]$ is intended
to capture the trust profile of the system $S$: e.g., how often it manifests
the undesirable behaviors, how reliable its track record is, etc. Testing the trust
hypothesis $\theta$ should tell us whether to stick with it, or replace it with another
trust statement.

In this section, we assume that the trust hypothesis $\Pr_\theta$ is given: e.g. from
the records of past behaviors. The statistic $f$ presents a new record, capturing
recent behaviors. The task is to align the two. The problem of formulating $\Pr_\theta$
will be discussed in the next section.

3.1 Significance testing of trust 

For simplicity, assume that the system $S$ has just 4 observable behaviors,
collected in the set $B = \{a, b, c, d\}$. To be trustworthy, the system should manifest
the acceptable behavior $a$ at least 98% of the time. It may block ($b$) or crash ($c$),
each for .5% of the time, and it may delay its functioning ($d$) for 1% of the time. So
we postulate the null hypothesis that the system $S$ behaves according to the
probability distribution $\Pr_0 : \{a, b, c, d\} \to [0,1]$ displayed in Table 1.

    B       a      b      c      d
    Pr_0    .98    .005   .005   .01

Table 1: Trustworthy behavior

For even more simplicity, assume that we observe just one of the events from the set
$\{a, b, c, d\}$. This means that the statistic $f : \{a, b, c, d\} \to \mathbb{R}$ will have the value
1 for one event, and 0 for the rest. Should we continue to trust the system $S$?

In statistics, the answer to this question is reduced to determining whether
the sample represented by the statistic $f$ is significant enough to reject the null
hypothesis (which was in our case that the system $S$ was trustworthy). The idea
of statistical significance testing is that the observation $f$ is significant enough
to reject the null hypothesis just when the observation $f$ is extremely unlikely
according to the null hypothesis. So we could fix a very small number $\alpha > 0$
and say that the null hypothesis should be rejected if $x$ is observed such that

$$\Pr_0(f(x) = 1) < \alpha \tag{1}$$

Since the times before computers, scientists have been in the habit of tabulating
and using $\alpha = 5\%$ and $\alpha = 1\%$. So if we use $\alpha = 1\%$ and observe $b$ or $c$, we
would have to reject the null hypothesis, and stop trusting the system $S$; and if
we observe $a$ or $d$ we could continue to trust it.

But to not oversimplify things, we should mention that already the founder
of statistics, Ronald Fisher, argued in [11] that a test should be considered
significant and the null hypothesis rejected only when

$$\sum_{y \,:\, \Pr_0(y) \le P} \Pr_0(y) < \alpha \tag{2}$$

where $P = \Pr_0(f(x) = 1)$ for the observed event $x$. In words, the total probability
of all events $y$ that are at least as unlikely as the observed event $x$ should
be less than $\alpha$. The left-hand side of (2) is the p-value of the observation $f$
under the hypothesis $\Pr_0$. The p-value of both $b$ and $c$ is now .01, since each
of them counts as at least as unlikely as the other, and at $\alpha = 1\%$ the null
hypothesis is never rejected. The p-values for $a$ and $d$ are 1 and .02 respectively.
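The two rejection rules can be compared on the toy distribution of Table 1 with a few lines of Python. This is only a minimal sketch: the function names are hypothetical, the probabilities are stored as exact fractions so that the borderline comparisons at $\alpha = 1\%$ are not blurred by rounding, and the inequality "at least as unlikely" is read as $\Pr_0(y) \le P$.

```python
from fractions import Fraction

# Null hypothesis Pr_0 from Table 1, kept as exact fractions.
PR0 = {
    "a": Fraction(980, 1000),
    "b": Fraction(5, 1000),
    "c": Fraction(5, 1000),
    "d": Fraction(10, 1000),
}


def rejects_by_rule_1(event, alpha, pr0=PR0):
    """Rule (1): reject when the observed event alone is rarer than alpha."""
    return pr0[event] < alpha


def p_value(event, pr0=PR0):
    """Left-hand side of (2): total probability of all events that are
    at least as unlikely as the observed one under the null hypothesis."""
    p = pr0[event]
    return sum(q for q in pr0.values() if q <= p)


def rejects_by_rule_2(event, alpha, pr0=PR0):
    """Rule (2): reject when the p-value is below alpha."""
    return p_value(event, pr0) < alpha


if __name__ == "__main__":
    alpha = Fraction(1, 100)
    for event in PR0:
        print(event,
              rejects_by_rule_1(event, alpha),
              float(p_value(event)),
              rejects_by_rule_2(event, alpha))
```

Under these assumptions, rule (1) rejects the null hypothesis for $b$ and $c$ only, while rule (2) rejects it for no single observation, because the smallest p-value equals $\alpha$ and the inequality is strict.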
Remark. It should be noted here that significance testing is a typical embodiment
of the negative logics of scientific induction: a test is only significant if
it disproves the null hypothesis. This aspect of inductive logic is similar to the 
proof by contradiction in deductive logic; but it is different from deductive logic 
in that this is the only inductive proof schema, while deductive logic also has 
the positive schemas. This logical constraint is just what makes inductive logic 
and the scientific methodologies built upon it, suitable for the reasoning about 
security and trust, as it echoes the fact that they can always be broken, and 
cannot be assured by logics. 

3.2 Powerful testing of trust 

While the significance testing allows rejecting the null hypothesis when significant
tests are found, it does not allow drawing any conclusions about the null
hypothesis when it is not rejected, and no conclusions about the other hypotheses
when the null hypothesis is rejected. The testing method devised by Neyman
and Pearson [?] considers two competing hypotheses $\Pr_\theta : B \to [0,1]$, for
$\theta \in \{0,1\}$, and maximizes the probability that the null hypothesis $\theta = 0$ is
rejected when the alternate hypothesis $\theta = 1$ happens to be true. This probability
is called the power of a test.

It is assumed that the null hypothesis $\theta = 0$, claiming that the observed
sample will be distributed according to $\Pr_0 : B \to [0,1]$, is the one that is
currently accepted, whereas the alternate hypothesis $\theta = 1$, claiming that the
observations will be distributed according to $\Pr_1 : B \to [0,1]$, will gain validity
if the test turns out to be significant and rejects the null hypothesis. For
instance, when a scientist hypothesizes that a phenomenon A is the cause of
the phenomenon B, then the null hypothesis is usually taken to be the claim
that the phenomenon B is not correlated to A, whereas the alternate hypothesis
is the claim that A and B are correlated. When a judge needs to decide whether
the accused A has committed a crime B, then the null hypothesis is that A is
innocent with respect to B, whereas the alternate hypothesis is that A is guilty
of B.

To continue with the example from Sec. 3.1, now consider the two hypothetic
distributions of the behaviors in the system $S$ displayed in Table 2. The last
line of the table is the likelihood ratio $\Pr_1(x)/\Pr_0(x)$. Neyman and Pearson [?] use the
likelihood ratio to decide when to reject the null hypothesis $\theta = 0$ in favor of
the alternative hypothesis $\theta = 1$.

    B                    a       b       c       d
    Pr_0                 .98     .005    .005    .01
    Pr_1                 .098    .001    .001    .9
    Pr_1(x) / Pr_0(x)    .1      .2      .2      90

Table 2: Trustworthy vs untrustworthy behavior

For this purpose, they introduce the decision
thresholds $\alpha$ and $\beta$, displayed in Table 3, which define the error probabilities
as follows

• $\alpha$ is the probability that the null hypothesis is rejected when it is true,
whereas

• $\beta$ is the probability that the null hypothesis is not rejected when it is false.



                      reality θ = 0              reality θ = 1

    decision θ = 0    true                       false negative
                      1 − α (confidence)         β = Pr(0|1)

    decision θ = 1    false positive             true
                      α = Pr(1|0)                1 − β (strength)

Table 3: Decision thresholds α and β


Since the rejection of the null hypothesis is conventionally viewed as the positive
outcome of a statistical test, the first type of error is called a false positive decision,
whereas the second type of error is called a false negative. E.g. in the court of
law, sentencing an innocent person is a false positive, and letting a guilty person
free is a false negative, since the null hypothesis is that the accused is innocent,
and the burden of proof towards rejecting this hypothesis is on the prosecution.
In a fire alarm system, the null hypothesis is that there is no fire, and the
false positive is when the alarm rings without fire, whereas a false negative is
when the alarm does not ring when there is fire. It is generally accepted as
worse to have false positives, since they lead to switching off the fire alarms,
rejecting the entire testing frameworks, and thus impelling the negatives as the
only outcomes. Neyman and Pearson therefore design the testing frameworks
where the upper bound $\alpha$ of the false positive decisions is chosen by the tester,
and then the upper bound for the false negative decisions is minimized. The
power of a test is defined to be the probability $1 - \beta$ that the null hypothesis is
rejected when it is really false. The Neyman-Pearson Lemma [?] says that the
maximally powerful test is given by the decision rule that the null hypothesis
of innocence should be rejected whenever the likelihood of guilt is


$$L(x) = \frac{\Pr_1(x)}{\Pr_0(x)} > \eta \tag{3}$$

where the threshold $\eta$ is such that the chance of false positives is

$$\Pr(L(x) > \eta \mid \theta = 0) = \alpha \tag{4}$$

The claim that (3) gives the most powerful test means that if the chance $\alpha$ in
(4) is fixed, then $\beta$ in

$$\Pr(L(x) \le \eta \mid \theta = 1) = \beta \tag{5}$$

is minimal for the fixed $\alpha$ when $L(x) = \Pr_1(x)/\Pr_0(x)$. Recall that $\alpha$ is the chance
that an innocent subject is found guilty, whereas $\beta$ is the chance that a guilty
subject is found innocent. Going back to the trust test from Sec. 3.1, where
$f : \{a, b, c, d\} \to \mathbb{R}$ captured the observation of a single system event, the
Neyman-Pearson powerful testing would reject the null hypothesis $\theta = 0$ in
favor of the alternative hypothesis $\theta = 1$ at the level $\alpha = 1\%$ only if the event $d$
is observed, and otherwise fails to obtain a significant result. This means that
we should only reject the trust hypothesis $\theta = 0$ and endorse the hypothesis
$\theta = 1$ that the system $S$ is not trustworthy if the observed delays $d$ amount to
more than 1% of the sampled performance time. Crashing or blocking .5% of
the time should not trigger our distrust.
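The selection of the rejection region in this example can be retraced with a short Python sketch. It is an illustration under simplifying assumptions, not the general Neyman-Pearson construction: the function names are hypothetical, the test is not randomized, and the only candidate regions are the sets $\{x : L(x) \ge r\}$ for the ratio values $r$ that occur in Table 2.

```python
from fractions import Fraction

# Hypotheses from Table 2, kept as exact fractions.
PR0 = {"a": Fraction(980, 1000), "b": Fraction(5, 1000),
       "c": Fraction(5, 1000), "d": Fraction(10, 1000)}
PR1 = {"a": Fraction(98, 1000), "b": Fraction(1, 1000),
       "c": Fraction(1, 1000), "d": Fraction(900, 1000)}


def likelihood_ratio(event):
    """L(x) = Pr_1(x) / Pr_0(x), the last line of Table 2."""
    return PR1[event] / PR0[event]


def rejection_region(alpha):
    """Largest set of the form {x : L(x) >= r} whose probability under
    Pr_0 (the false positive rate) does not exceed alpha."""
    region = set()
    for r in sorted({likelihood_ratio(x) for x in PR0}, reverse=True):
        candidate = {x for x in PR0 if likelihood_ratio(x) >= r}
        if sum(PR0[x] for x in candidate) > alpha:
            break  # adding the next events would exceed the bound alpha
        region = candidate
    return region


def error_rates(region):
    """Return (alpha, beta): the false positive and false negative rates."""
    false_positive = sum(PR0[x] for x in region)
    false_negative = sum(PR1[x] for x in PR0 if x not in region)
    return false_positive, false_negative


if __name__ == "__main__":
    region = rejection_region(Fraction(1, 100))
    alpha, beta = error_rates(region)
    print(sorted(region), float(alpha), float(beta), float(1 - beta))
```

For $\alpha = 1\%$ the sketch selects the region $\{d\}$, in agreement with the text, with $\beta = .1$ and hence power $.9$ computed from Table 2.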

Note that the threshold $\alpha = 1\%$, imposed in the powerful testing as the
upper bound of the false positives, has eliminated the significance of the
observations $b$ and $c$, which were significant enough to cause the rejection of the
null hypothesis at the same threshold level $\alpha = 1\%$ in Sec. 3.1. On the other
hand, the minimization of the false negatives in the powerful testing has now
introduced the observation $d$ as significant, which it was not in the significance
testing. The two testing approaches thus implement two incomparable views of
trust. It seems worthwhile to further explore which one might be more suitable
for which application domains.

Although the powerful testing allows comparing pairs of hypotheses (albeit
in essentially asymmetric roles of the null hypothesis and its alternative!), it
actually provides little help in selecting between multiple alternative hypotheses.
The best we can do with powerful testing in such situations is to test the null 
hypothesis against each of the candidate alternatives. However, such approaches 
lead to pathological situations, where the hypothesis 0 is rejected against 1, 1 
against 2, and 2 against 1. Similar phenomena arise when the same significance 
test is applied to several hypotheses, in the hope that some will be rejected and 
some not. Overcoming such difficulties requires randomized sampling, Bayesian 
reasoning, and controlled experiments. 


3.3 Experimental testing of trust 


If I know an overall probability $\Pr(0)$ that a system similar to S might be 
trustworthy, and $\Pr(1) = 1 - \Pr(0)$ that it might not be trustworthy, then I 
could derive the probability $\Pr(0|x)$ that the system S is trustworthy after the 
observed behavior $x \in B$ using Bayes’ law: 

$$\Pr(0|x) = \frac{\Pr_0(x)\,\Pr(0)}{\Pr_0(x)\,\Pr(0) + \Pr_1(x)\,\Pr(1)} \tag{6}$$


If there are several hypotheses $\theta \in \Theta = \{0, 1, 2, \ldots\}$ about the behavioral 
profiles of the systems, then I can calculate the probability of each of them after 
the observation $x \in B$ by the general formula 

$$\Pr(\theta|x) = \frac{\Pr_\theta(x)\,\Pr(\theta)}{\sum_{\psi \in \Theta} \Pr_\psi(x)\,\Pr(\psi)} \tag{7}$$


However, the only way to control the distribution $\Pr : \Theta \to [0,1]$ of the trust 
profiles of a population of systems to which S belongs is to model this population 
in the experimental environment of a laboratory, where I could ensure that the 
sample is distributed according to $\Pr : \Theta \to [0,1]$. Sampling the behaviors of 
the system S in this controlled environment would then allow me to calculate 
$\Pr(\theta|x)$ according to (7) for all profiles $\theta \in \Theta$, and to select the most likely 
profile in $\Theta$ as my current trust hypothesis about S. 
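
As a minimal sketch of how the formulas (6) and (7) could be evaluated, assume that the behaviors in B are encoded as strings and that each profile is given as a dictionary of probabilities. The function name `posterior`, the event names, and all the numbers below are hypothetical, chosen only for illustration.

```python
def posterior(prior, likelihoods, x):
    """Return Pr(theta | x) for every hypothesis theta, following formula (7).

    prior:       dict mapping each hypothesis theta to Pr(theta)
    likelihoods: dict mapping each hypothesis theta to its profile Pr_theta,
                 itself a dict mapping behaviors to probabilities
    x:           the observed behavior
    """
    # Numerators of (7): Pr_theta(x) * Pr(theta) for each hypothesis.
    joint = {theta: likelihoods[theta].get(x, 0.0) * prior[theta]
             for theta in prior}
    # Denominator of (7): the sum of the numerators over all hypotheses.
    evidence = sum(joint.values())
    if evidence == 0.0:
        raise ValueError("x has probability zero under every hypothesis")
    return {theta: value / evidence for theta, value in joint.items()}


# Hypothetical two-hypothesis case, as in (6): 0 = trustworthy, 1 = not.
prior = {0: 0.9, 1: 0.1}
likelihoods = {
    0: {"ok": 0.99, "delay": 0.01},   # assumed profile Pr_0
    1: {"ok": 0.80, "delay": 0.20},   # assumed profile Pr_1
}

post = posterior(prior, likelihoods, "delay")
most_likely = max(post, key=post.get)  # the most likely profile after x
```

With these illustrative numbers, a single observed delay already shifts the posterior towards the hypothesis 1, although the prior favored the hypothesis 0.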

But even this experimental environment, where I can impose the prior 
probability $\Pr : \Theta \to [0,1]$ by controlling the sample, does not give me the prior 
probabilities $\Pr_\theta : B \to [0,1]$, which express the trust hypotheses to be tested. 
Where do they come from? 


4 Formulating trust hypotheses 

How exactly should I find the trust hypotheses suitable for testing? How should 
I select the most important ones? 

4.1 The scientific presumption of innocence 

Both the scientific methodology and the sound legal practices suggest that the 
null hypothesis should be that the system is trustworthy, i.e. “innocent 
until proven guilty” [35]. The alternative hypotheses should describe the various 
forms of undesired behavior, which the tested sample might uncover if the null 
hypothesis is rejected. 

If I know the statistical profile of the desired normal behavior of a system, 
then I should take that profile as the null hypothesis $\Pr_0 : B \to [0,1]$. But it 
is usually difficult to specify the desired normal behavior as a single profile. It 
is much easier to characterize each of the abnormal behaviors, which we learn 
from the anomalies experienced in the past. That is why the statistical 
intrusion detection systems and forensics mostly work with the statistical 
profiles of intruders and criminals, and test these profiles as the null hypotheses. 

The problem with this “guilty until proven innocent” approach is not just 
that it is unfair in court. A greater problem arises from the logical limitation 
of inductive inference: that the null hypothesis can never be proved by a finite 
number of tests, but can only be disproved. By testing the profiles of guilt on the 
given samples of behaviors, we can never demonstrate anyone’s guilt; we can 
only fail to disprove it. The consequence in the realm of security is that the trust 
based on testing and rejecting every known form of undesirable behavior is not 
only impractical, but also the weakest possible form of trust. All that you know 
is that no guilt has been proven yet. The complexity and the ineffectiveness 
of this method are illustrated time and again by the complexity and the 
ineffectiveness of the vetting procedures, which often admit untrustworthy subjects, 
while regularly rejecting trustworthy subjects. Scientifically based trust, based 
on testing the null hypothesis that the subject is trustworthy, would obviously 
be simpler and more effective, both because it allows sound statistical controls 
of the false positives and the false negatives, and also because it eliminates 
not only the known anomalies, but all anomalies that are inconsistent with the 
normal behavior profile described by the null hypothesis. 

But where can I find the statistical profile $\Pr_0 : B \to [0,1]$ characterizing 
the trustworthy behavior of the system S? I could log the normal functioning 
of the system for a long time; but which observable system events B yield the 
relevant observations? 

The first limitation of scientific induction, discussed so far, is that it never 
proves, but only disproves its hypotheses. Here we confront its second limitation: 
the null hypotheses cannot be extracted from the empiric data, but always have 
to be formulated a priori. 

4.2 Compressing trust 

The problem of formulating a priori hypotheses was discussed in philosophy 
of science several centuries ago, but remained unsolved. The path towards the 
modern solutions was opened by Ray Solomonoff [39], and cleared by Andrei 
Kolmogorov and his school. The versions suitable for practical applications 
in machine learning and in statistics were developed by Jorma Rissanen [37], 
Chris Wallace [40], and many others. Very roughly, the idea is as follows. 

Continuing with the notation from Sec. 3.3, we still denote the set of 
hypotheses by $\Theta$. The problem is that we do not know the probabilities $\Pr_\theta : B \to [0,1]$. 


We are, however, given a sufficiently large data sample, from which we extract 
the frequency distribution $\Pr : B \to [0,1]$ of each observation. 
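
Extracting the frequency distribution from a sample is the straightforward part. A minimal sketch, assuming that the sample is a plain list of observed events (the function name `frequency_distribution`, the event names, and the counts are hypothetical):

```python
from collections import Counter


def frequency_distribution(sample):
    """Estimate Pr : B -> [0,1] as the relative frequency of each event."""
    if not sample:
        raise ValueError("the sample must not be empty")
    counts = Counter(sample)
    total = len(sample)
    return {event: count / total for event, count in counts.items()}


# Hypothetical log of 100 observed behaviors.
log = ["ok"] * 97 + ["delay"] * 2 + ["crash"]
freq = frequency_distribution(log)
```

The result only summarizes what was observed; it does not by itself say which hypothesis generated the observations.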

The task is now to find a hypothesis $\theta = \theta_0 \in \Theta$ such that $\Pr_{\theta_0} : B \to [0,1]$ 
maximizes the conditional probability $\Pr(\theta_0|x)$ in (7) when the behavior $x$ 
is observed. Since 

$$\Pr(x) = \sum_{\psi \in \Theta} \Pr_\psi(x)\,\Pr(\psi) \tag{8}$$


the Bayes’ formula (7) now becomes 

$$\Pr(\theta|x) = \frac{\Pr_\theta(x)\,\Pr(\theta)}{\Pr(x)} \tag{9}$$


The null hypothesis $\theta_0$ gives the probability distribution $\Pr_{\theta_0} : B \to [0,1]$ such 
that, for the observed $x$, $\Pr(\theta_0|x) \geq \Pr(\theta|x)$ holds for all $\theta \in \Theta$. Since the 
probability $\Pr(x)$ is given by the observed data, the task only depends on the 
unknown hypotheses $\theta \in \Theta$. 

The idea used by Solomonoff, Kolmogorov and others is to apply Occam’s 
razor here, and to postulate that the simplest hypotheses have the highest a 
priori probability. The idea is implemented by taking into account the lengths 
of the descriptions of the probabilities in (9). Using the optimal Shannon-Fano 
encodings [8], we can write a number $p \in [0,1]$ using $-\log p$ bits. The task of 
maximizing (9) now becomes the task of minimizing 


$$-\log \Pr(\theta|x) = -\log \Pr_\theta(x) - \log \Pr(\theta) + \log \Pr(x)$$


Since Pr(x) is fixed, this means that 


$$\theta_0 = \operatorname*{arg\,min}_{\theta \in \Theta} \left\{ -\log \Pr_\theta(x) - \log \Pr(\theta) \right\} \tag{10}$$

This is equivalent to $\Pr_{\theta_0}(x) \cdot \Pr(\theta_0) \geq \Pr_\theta(x) \cdot \Pr(\theta)$ for all $\theta \in \Theta$, which picks 
$\theta_0$ to maximize the chance that $x$ is observed. This is what makes $\theta_0$ the best 
a priori null hypothesis. The minimality of the description length $-\log \Pr(\theta_0)$ 
means that $\theta_0$ is the simplest. The minimality of $-\log \Pr_{\theta_0}(x)$, or equivalently 
the maximality of $\Pr_{\theta_0}(x)$, means that $x$ is the most likely prediction of $\theta_0$. The 
minimality of $-\log \Pr_{\theta_0}(x) - \log \Pr(\theta_0)$ means that $\theta_0$ is the simplest hypothesis 
among those that predict $x$. 

Instantiated to the realm of trust, (10) thus says that the best trust 
hypothesis is the one that provides the shortest description of my notion of trust, which 
fits the observations that I have made. 
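
The selection rule (10) can be sketched in a few lines, assuming a finite set of hypotheses with known prior probabilities and profiles, and a single observed behavior x. The names `description_length` and `best_hypothesis`, as well as all the numbers, are hypothetical and serve only as an illustration.

```python
import math


def description_length(theta, x, prior, likelihoods):
    """Return -log2 Pr_theta(x) - log2 Pr(theta), i.e. the cost in bits."""
    p_x = likelihoods[theta].get(x, 0.0)
    p_theta = prior[theta]
    if p_x == 0.0 or p_theta == 0.0:
        return math.inf  # theta cannot describe x at any finite cost
    return -math.log2(p_x) - math.log2(p_theta)


def best_hypothesis(x, prior, likelihoods):
    """Return the hypothesis minimizing the description length, as in (10)."""
    return min(prior,
               key=lambda theta: description_length(theta, x, prior, likelihoods))


# Hypothetical priors: a simpler hypothesis gets a higher a priori probability.
prior = {0: 0.5, 1: 0.125, 2: 0.375}
likelihoods = {
    0: {"ok": 0.80, "delay": 0.20},
    1: {"ok": 0.60, "delay": 0.40},
    2: {"ok": 0.95, "delay": 0.05},
}

theta_0 = best_hypothesis("delay", prior, likelihoods)
```

In this illustration the hypothesis 1 predicts the observed delay better than the hypothesis 0, but the two extra bits needed to describe it outweigh the one bit gained in the fit, so the rule selects 0.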

The rapidly expanding research area of algorithmic learning and statistical 
inference is concerned not only with the effective computations of the a priori 
hypotheses, but also with the situations where the succinct descriptions of the 
data and the hypotheses need to be combined with empiric data. The 
right-hand side of (10) is roughly Rissanen’s Minimum Description Length (MDL) 
of the distribution of the observed data $x$. Wallace’s Minimum Message 
Length (MML) [37] differs in the compression methods used. Kolmogorov’s 
minimal sufficient statistic uses the optimal computable encodings as the 
compression method. The standard compression algorithms, e.g. based on the 
very efficient Lempel-Ziv algorithms, are also often used, and give 
reasonable results. In any case, the best null hypothesis is the one which best 
compresses the observed data $x$, within some given family of compression 
algorithms. The underlying idea is that the better we understand the data, the 
better we compress them. 
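
To illustrate the compression view with a standard tool, the following sketch uses Python's `zlib` module, whose DEFLATE method combines a Lempel-Ziv (LZ77) stage with Huffman coding. It rests on several assumptions of mine: each candidate hypothesis is represented by a short reference trace of typical behavior, that trace is used as a preset compression dictionary, and the cost of describing the hypothesis itself is ignored, so that only the data term is compared. The log lines and the names `compressed_size` and `best_profile` are hypothetical.

```python
import zlib


def compressed_size(data: bytes, profile: bytes) -> int:
    """Number of bytes needed to encode data with DEFLATE primed on profile."""
    compressor = zlib.compressobj(level=9, zdict=profile)
    return len(compressor.compress(data) + compressor.flush())


def best_profile(data: bytes, profiles: dict) -> str:
    """Return the name of the profile under which data compresses best."""
    return min(profiles, key=lambda name: compressed_size(data, profiles[name]))


# Hypothetical reference traces standing in for two behavioral profiles.
profiles = {
    "trustworthy": (
        b"login user=alice result=success latency=12ms\n"
        b"read file=/srv/reports/q3.pdf result=success latency=30ms\n"
        b"logout user=alice result=success latency=5ms\n"
    ),
    "untrustworthy": (
        b"login user=root result=denied attempts=17\n"
        b"read file=/etc/shadow result=denied attempts=3\n"
        b"scan port=22 target=10.0.0.7 result=open\n"
    ),
}

# Hypothetical observed behavior of the tested system.
observed = (
    b"login user=alice result=success latency=14ms\n"
    b"read file=/srv/reports/q3.pdf result=success latency=28ms\n"
    b"logout user=alice result=success latency=6ms\n"
)

selected = best_profile(observed, profiles)
```

The profile that shares more structure with the observed trace yields the shorter encoding. A fuller treatment in the spirit of (10) would also charge each hypothesis for its own description length.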

Although these methods give somewhat degenerate results when applied to 
our toy examples from Sec. 3, just slightly larger trust hypotheses show the 
intuitive meaning of (10) in the realm of trust. My trust hypothesis should be 
the simplest description of the desired behaviors which best approximates the 
observed behaviors of the tested system. 

5 Background and future work 

The main claim of this paper is that the methods of statistical inference, on 
which modern science has been built, can be used to analyze and secure trust. 
We close the paper by relating this idea to the general context of trust research, 
and in particular with the existing application of statistical methods to trust 
testing in the framework of intrusion detection. 

The literature about trust is very extensive, as it is studied in psychology, 
social sciences, economics, and game theory [23]. Even within the closely 
related security research communities, the word ‘trust’ is used in several different 
meanings. The notion of trust used in this paper is based on [33]. 

A quantitative analysis of the process of trust building was initiated in [31]. 
The question of trust decisions was, however, avoided by reducing them to the 
preferences extracted from the trust ratings. The question of trust 
measurements was avoided by reducing them to user ratings and feedback, which are 
usually available in web commerce, but not in general. In system security, the 
task of quantifying security in general and trust in particular becomes a problem. 
In the present paper, we did not consider the problem of quantifying 
trust a posteriori, i.e. using the measurements of the past performance, but 
focused on the harder problem of formulating the trust hypotheses a priori, i.e. 
before any empiric data are available. This problem arises even if 
satisfactory methods for quantifying trust and security a posteriori are available, 
because the data are not always available. On the other hand, understanding 
how to express the a priori trust beliefs may also help in devising and validating 
the methods to quantify them a posteriori. 

The idea of statistical intrusion detection , going back to Dorothy Denning 
and her work with Peter Neumann at SRI in the 80s, can be viewed as an 
application of statistics to detect the subjects or the components that are not 
trustworthy. An early survey is [24]. The practices of intrusion detection have 
evolved a lot since those early days, and the rule based methods seem to have 
found broader applications than the statistical methods. One of the reasons 
often mentioned is the difficulty of controlling the false positives that arise when 
statistical tests are used to detect the intruders. We explained in Sec. 4.1 why 
the statistical methodologies suggest that trust testing should be based on taking 
a trustworthy behavior as the null hypothesis, and why testing for anomalies 
and the untrustworthy behaviors leads to the false positives that are harder to 
control, and to less reliable results overall. In statistics, proving that someone 
is not trustworthy is not equivalent to disproving that they are trustworthy. 
The general method for controlling the false positives when disproving trust 
is outlined in Sec. 3. The false positives thus emerge as a hard problem in 
statistical intrusion detection because it tests for the intrusions, and not for 
trust. The reason is, of course, that the intruder profiles are much easier to 
come by than the trustworthy profiles. In Sec. 4.2 we discussed the way to 
solve this problem using the methods of algorithmic learning. Whether that 
brief discussion explained or obscured the idea, there is very little doubt that at 
least a theoretical solution lies in this direction. But the practical work towards 
implementing such computation-based scientific methodologies on the concrete 
problems of trust lies ahead. 